Video Content-Based Retrieval Techniques

Author

  • Waleed E. Farag
Abstract

The increasing use of multimedia streams necessitates the development of efficient and effective methodologies and systems for manipulating the databases that store these streams. Such systems have various areas of application, such as video-on-demand and digital libraries. The importance of video content-based retrieval (CBR) systems motivates us to explain their basic components in this chapter and shed light on their underlying working principles. In general, a content-based retrieval system for video data consists of the following four stages: (1) video shot boundary detection, (2) key frame (KF) selection, (3) feature extraction (from the selected KFs), and (4) retrieval (where similarity matching operations are performed). Each of these stages is reviewed and expounded based on our experience in building a Video Content-based Retrieval (VCR) system that has been fully implemented from scratch in the Java language (2002). Moreover, current research directions and outstanding problems are discussed for each stage in the context of our VCR system.

Copyright © 2004, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc. is prohibited.

INTRODUCTION

Recently, multimedia applications have been undergoing explosive growth due to the steady increase in available processing power and bandwidth. This growth produces large amounts of media data that need to be effectively and efficiently organized and stored. While these applications generate and use vast amounts of multimedia data, the technologies for organizing and searching them are still in their infancy. These data are usually stored in multimedia archives that use search engines to enable users to retrieve the required information.
Searching a repository of data is a well-known, important task whose effectiveness generally determines the success or failure of obtaining the required information. A valuable lesson learned from the explosion of the web is that the usefulness of vast repositories of digital information is limited by the effectiveness of the access methods (Brunelli, Mich, & Modena, 1999). In short, this emphasizes the great importance of providing effective search techniques. For alphanumeric data, many portals (Baldwin, 2000), such as Google, Yahoo, MSN, and Excite, have become widely accessible via the web. These search engines provide their users with a keyword-based search model for accessing the stored information, but the inaccuracy of their search results is a known drawback. For multimedia data, describing unstructured information (such as video) using textual terms is not an effective solution, because such data cannot be uniquely described by a set of statements. This is mainly because human opinions vary from one person to another (Ahanger & Little, 1996), so two people may describe the same image with totally different statements. Therefore, the highly unstructured nature of multimedia data renders keyword-based search techniques inadequate.

Video streams are considered the most complex form of multimedia data because they contain almost all other forms, such as images and audio, in addition to their inherent temporal dimension. The central role of video data among all multimedia forms motivated us to focus this chapter on proposing an effective search paradigm for that particular medium. One promising solution for searching multimedia data in general, and video data in particular, is content-based search and retrieval. The basic idea is to access video data by their contents, for example, through visual features such as color or texture.
Realizing the importance of content-based searching, researchers have started investigating the issue and proposing creative solutions (Chang, 1998). Most of the proposed video indexing and retrieval prototypes have the following two major phases (Flickner et al., 1995):

  • Database population phase, consisting of the following steps:
    ◦ Shot boundary detection. The purpose of this step is to partition a video stream into a set of meaningful and manageable segments (Idris & Panchanathan, 1997), which then serve as the basic units for indexing.
    ◦ Key frame selection. This step summarizes the information in each shot by selecting representative frames that capture the salient characteristics of that shot.
    ◦ Extraction of low-level features from key frames. During this step, a number of low-level spatial features (color, texture, etc.) are extracted for use as indices to key frames, and hence to shots. Temporal features (e.g., object motion) can be used too.
  • Retrieval phase. In this stage, a query is presented to the system, which in turn performs similarity matching operations and returns similar data (if found) to the user. One technique commonly used to present queries to video databases is QBE (Query By Example) (Yoshitaka & Ichikawa, 1999). In this technique, an image or a video clip is presented to the system, and the user requests the system to retrieve similar items.

Farag & Abdel-Wahab

In this chapter, we present a new paradigm for solving the problem of content-based indexing and retrieval of video data. The proposed system aims to achieve its objectives by developing novel and effective approaches to each stage of the problem at hand.
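The two-phase organization described above can be sketched as a small Python skeleton. This is a minimal illustration under stated assumptions, not the VCR system's design (which the chapter says was written in Java): frames are modeled as nested lists of grayscale intensities, and the stage functions `detect_shots`, `select_key_frames`, `extract_features`, and `distance` are hypothetical pluggable callables. The toy stages at the bottom (a known cut position, the first frame as key frame, mean intensity as the only feature) are placeholders chosen for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Frame = List[List[int]]   # grayscale frame: rows of pixel intensities
Features = List[float]    # low-level feature vector used as an index


@dataclass
class IndexEntry:
    shot_id: int          # shot represented by this key frame
    frame_no: int         # position of the key frame in the video stream
    features: Features    # metadata used later for similarity matching


def populate(frames: Sequence[Frame],
             detect_shots: Callable[[Sequence[Frame]], List[Tuple[int, int]]],
             select_key_frames: Callable[[Sequence[Frame], int, int], List[int]],
             extract_features: Callable[[Frame], Features]) -> List[IndexEntry]:
    """Database population phase: shots -> key frames -> feature indices.

    detect_shots returns half-open (start, end) frame ranges, one per shot.
    """
    index: List[IndexEntry] = []
    for shot_id, (start, end) in enumerate(detect_shots(frames)):
        for frame_no in select_key_frames(frames, start, end):
            index.append(IndexEntry(shot_id, frame_no, extract_features(frames[frame_no])))
    return index


def query_by_example(index: List[IndexEntry], example: Frame,
                     extract_features: Callable[[Frame], Features],
                     distance: Callable[[Features, Features], float],
                     k: int = 3) -> List[IndexEntry]:
    """Retrieval phase (QBE): describe the query like the key frames, return the k closest."""
    q = extract_features(example)
    return sorted(index, key=lambda e: distance(q, e.features))[:k]


# Toy placeholder stages, for illustration only (hypothetical).
def known_cuts(frames: Sequence[Frame]) -> List[Tuple[int, int]]:
    return [(0, 2), (2, len(frames))]  # pretend the cut after frame 1 is already known


def first_frame(frames: Sequence[Frame], start: int, end: int) -> List[int]:
    return [start]


def mean_intensity(frame: Frame) -> Features:
    return [sum(map(sum, frame)) / (len(frame) * len(frame[0]))]


def l1(a: Features, b: Features) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


video = [[[10, 10]], [[12, 12]], [[200, 200]], [[198, 198]]]
idx = populate(video, known_cuts, first_frame, mean_intensity)
best = query_by_example(idx, [[190, 190]], mean_intensity, l1, k=1)
print(best[0].shot_id, best[0].frame_no)  # 1 2
```

Passing the stages in as functions keeps both phases independent of any particular shot detector or feature, and it makes explicit the QBE requirement that the query be described with the same `extract_features` used to build the index.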
In spite of the fact that a number of video indexing and retrieval prototype systems have been introduced by other researchers, we believe that essential problems still require better solutions. These solutions should aim at improving the reliability, efficiency, and effectiveness of video retrieval systems. The first shortcoming of most current video retrieval systems is the unreliability of the shot boundary detection stage (Hanjalic & Zhang, 1999). The multiplicity of video streams, their varying contents, and the huge amounts of data involved are some of the obstacles to designing robust and efficient techniques for detecting shot boundaries. Moreover, the unreliability of this stage affects not only its own performance but also that of the whole retrieval system, because the output of this stage largely determines the results of all subsequent stages. The developed system introduces a novel paradigm for detecting scene changes that is both reliable and efficient, thus solving the problems exhibited by other shot boundary detection methodologies. The second problem addressed is how to devise an efficient algorithm to abstract the large amount of information found in each segmented shot. Most current approaches are either so oversimplified that they cannot make the right choice or so complex that they are unsuitable for online processing. Two efficient key frame selection algorithms are introduced to avoid these shortcomings. Deriving content indices from the selected key frames is the next stage, in which we use two low-level features (color and texture) as the basic components of the generated metadata. These metadata are used in all further processing and similarity matching operations. The effectiveness of the retrieval stage is the last issue we are concerned with.
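The paragraph above does not spell out the two key frame selection algorithms or the exact color and texture descriptors, so the sketch below substitutes generic, commonly used stand-ins that are not the authors' methods: a normalized intensity histogram as a simplified color feature, the mean absolute horizontal and vertical neighbor differences as a crude texture feature, and key frame selection that picks the frame whose histogram is closest (L1 distance) to the shot's mean histogram. Frames are assumed to be grayscale nested lists with values in 0..255, and all function names are hypothetical.

```python
from typing import List, Sequence

Frame = List[List[int]]  # grayscale frame, pixel values assumed in 0..255


def gray_histogram(frame: Frame, bins: int = 8, levels: int = 256) -> List[float]:
    """Normalized intensity histogram (a simplified stand-in for a color feature)."""
    pixels = [p for row in frame for p in row]
    hist = [0.0] * bins
    for p in pixels:
        hist[p * bins // levels] += 1
    return [h / len(pixels) for h in hist]


def gradient_texture(frame: Frame) -> List[float]:
    """Crude texture feature: mean absolute horizontal and vertical neighbor differences."""
    horiz = [abs(row[j + 1] - row[j]) for row in frame for j in range(len(row) - 1)]
    vert = [abs(frame[i + 1][j] - frame[i][j])
            for i in range(len(frame) - 1) for j in range(len(frame[0]))]
    return [sum(horiz) / len(horiz) if horiz else 0.0,
            sum(vert) / len(vert) if vert else 0.0]


def select_key_frame(shot: Sequence[Frame], bins: int = 8) -> int:
    """Index of the frame whose histogram is closest (L1) to the shot's mean histogram."""
    hists = [gray_histogram(f, bins) for f in shot]
    mean = [sum(col) / len(hists) for col in zip(*hists)]
    return min(range(len(shot)),
               key=lambda i: sum(abs(a - b) for a, b in zip(hists[i], mean)))


def key_frame_features(frame: Frame) -> List[float]:
    """Metadata for one key frame: histogram values followed by texture values."""
    return gray_histogram(frame) + gradient_texture(frame)


shot = [[[10, 10]], [[10, 200]], [[200, 200]]]
kf = select_key_frame(shot)              # the middle frame best summarizes this shot
features = key_frame_features(shot[kf])
```

Choosing the frame nearest the shot's average keeps selection to one pass over the shot's histograms, which is in the spirit of the online-processing constraint mentioned above, although it can pick an unrepresentative frame when a shot contains several distinct visual states.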
This issue is critical in determining the success of a content-based retrieval system for video data. Current retrieval systems overlook an essential fact when measuring the similarity of video data: similarity matching is meaningful only if it models what humans do. Unlike other techniques, the proposed retrieval system introduces a new similarity matching approach that attempts to model the way humans perceive multimedia data and judge their similarity. The developed retrieval module handles a number of shortcomings in current prototypes, thus improving the overall performance of the retrieval system. In short, the main objective of this chapter is to explain the working principles of a novel video content-based indexing and retrieval system whose main task is to provide its users with an easy-to-use, effective, and efficient scheme for retrieving the required information.

BACKGROUND

In this section, we review some related work that aims at solving the problem of CBR (Content-Based Retrieval) of video data, highlighting the challenges and the inadequacies of current approaches. This review covers all the stages of the developed CBR system. To start with, we give a quick overview of indexing and retrieving digital images based on their contents (Rui, Huang, & Chang, 1999), because of its relevance to the problem at hand. The ideal way to describe the content of an image is in terms of its objects. However, object recognition in a general image database is a very hard problem. Instead, researchers extract low-level features (color, texture, shape, structure, spatial relationships among objects, etc.) to describe the content of an image. These features are then employed as indices to the image.
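As an illustration of a low-level feature serving as an image index, the following sketch indexes a toy image collection by intensity histograms and answers a query image by ranking with histogram intersection, a standard similarity measure for normalized histograms. It is swapped in for illustration and is not claimed to be the measure used by any of the systems cited in this section; the image names and pixel values are hypothetical.

```python
from typing import Dict, List

Image = List[List[int]]  # grayscale image, pixel values assumed in 0..255


def histogram(img: Image, bins: int = 4, levels: int = 256) -> List[float]:
    """Normalized intensity histogram used as the image's content index."""
    pixels = [p for row in img for p in row]
    hist = [0.0] * bins
    for p in pixels:
        hist[p * bins // levels] += 1
    return [h / len(pixels) for h in hist]


def intersection(h1: List[float], h2: List[float]) -> float:
    """Histogram intersection: 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Stage 1 (indexing): extract features once and store them alongside the images.
images: Dict[str, Image] = {
    "dark": [[10, 20], [30, 40]],
    "bright": [[220, 230], [240, 250]],
    "mixed": [[10, 20], [230, 240]],
}
index = {name: histogram(img) for name, img in images.items()}

# Stage 2 (retrieval): extract the same feature from the query, then rank by similarity.
query = [[15, 25], [35, 45]]
q = histogram(query)
ranking = sorted(index, key=lambda name: intersection(q, index[name]), reverse=True)
print(ranking)  # ['dark', 'mixed', 'bright']
```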
Extracting low-level features and storing them in the database constitutes the first stage of indexing images by content. The main function of the second stage, the retrieval system, is to analyze the presented query image in order to extract the same features from it. The retrieval system then performs similarity matching operations between the features extracted from the query and those stored in the database. A number of systems have been proposed in the literature that, in general, apply this approach to access and browse images based on their contents (Abbadi, 2000; Abdel-Mottaleb, Dimitrova, Desai, & Martino, 1996; Delp, 1999; Flickner et al., 1995; Gupta & Jain, 1997; Hsu, Chua, & Pung, 1995; Krishnamachari & Abdel-Mottaleb, 1999; Ma & Manjunath, 1997; Pentland, Picard, & Sclaroff, 1994; QBIC System, 2000; Shneier & Abdel-Mottaleb, 1996; Smith & Chang, 1996; Stanford Digital Library Group, 1995; Virage Video Indexing System, 2000). Indexing and retrieving video data, on the other hand, is much more complicated, because video data have both temporal and spatial dimensions (Idris & Panchanathan, 1997). In the following, we survey related work proposed by various researchers for each stage of a video retrieval system.

Related Work on Video Segmentation

Video data are rich sources of information, and in order to model these data, their information content has to be analyzed. Video analysis is divided into two stages (Rui, Huang, & Mehrotra, 1998a). The first stage divides the video sequence into a group of shots (shot boundary detection), while the second stage selects key frame(s) to represent each shot. Generally, there are two approaches to segmenting video data: one works in the uncompressed domain, while the other works in the compressed domain (Chang, 1995). The uncompressed-domain approach is discussed first.
Methods in the uncompressed domain can be broadly classified into five categories: template-matching, histogram-based, twin-comparison, block-based, and model-based techniques. In template-matching techniques (Hampapur, Jain, & Weymouth, 1994; Zhang, Kankanhalli, Smoliar, & Tan, 1993), each pixel at spatial location (i, j) in frame f_m is compared with the pixel at the same location in frame f_n, and a scene change is declared whenever the difference function exceeds a pre-specified threshold. With this metric, it is difficult to distinguish between a small change in a large area and a large change in a small area. Moreover, because every co-located pixel contributes to the difference, template-matching techniques are sensitive to noise, object motion, and camera operations.

One example of the use of histogram-based techniques is presented in Tonomura (1991), where the histogram of each video frame and a difference function (S) between f_n and f_m are calculated using Equation (1); if S is greater than a threshold, a cut is declared.
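The weaknesses of pixel-wise template matching noted above can be reproduced on tiny synthetic frames. The sketch compares a template-matching score (mean absolute difference of co-located pixels) with a histogram difference. Because Equation (1) is not reproduced in this excerpt, the bin-wise absolute difference used here is an assumed form, not Tonomura's exact formula; the frames, bin count, and threshold are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame, pixel values assumed in 0..255


def pixel_difference(fm: Frame, fn: Frame) -> float:
    """Template matching: mean absolute difference between pixels at the same (i, j)."""
    total = sum(abs(a - b) for row_m, row_n in zip(fm, fn) for a, b in zip(row_m, row_n))
    return total / (len(fm) * len(fm[0]))


def histogram(frame: Frame, bins: int = 4, levels: int = 256) -> List[int]:
    """Unnormalized intensity histogram with equal-width bins."""
    hist = [0] * bins
    for row in frame:
        for p in row:
            hist[p * bins // levels] += 1
    return hist


def histogram_difference(fm: Frame, fn: Frame) -> int:
    """Assumed form of S: sum of bin-wise absolute histogram differences."""
    return sum(abs(a - b) for a, b in zip(histogram(fm), histogram(fn)))


def is_cut(score: float, threshold: float) -> bool:
    """Declare a cut when the difference function exceeds a pre-specified threshold."""
    return score > threshold


base = [[100] * 4, [100] * 4]
global_shift = [[132] * 4, [132] * 4]             # small change over a large area
local_change = [[228, 228, 100, 100], [100] * 4]  # large change in a small area
obj_a = [[0, 0, 0, 0], [0, 255, 0, 0]]
obj_b = [[0, 0, 0, 0], [0, 0, 255, 0]]            # the bright "object" moved one pixel

print(pixel_difference(base, global_shift), pixel_difference(base, local_change))  # 32.0 32.0
print(histogram_difference(base, global_shift), histogram_difference(base, local_change))  # 16 4
print(pixel_difference(obj_a, obj_b), histogram_difference(obj_a, obj_b))  # 63.75 0
```

Both frame pairs score 32.0 under template matching even though one is a uniform brightness shift and the other a large change in two pixels, and a one-pixel object shift produces the largest score of all while leaving the histogram unchanged. Histograms ignore pixel positions, which is why they tolerate object motion, at the cost of treating any two frames with the same intensity distribution as identical.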

Similar resources

Content based Video Retrieval, Classification and Summarization: The State-of-the-Art and the Future

This chapter provides an overview of different video content modeling, retrieval and classification techniques employed in existing content-based video indexing and retrieval (CBVIR) systems. Based on the modeling requirements of a CBVIR system, we analyze and categorize existing modeling approaches. Starting with a review of video content modeling and representation techniques, we study view-i...

Social Video Retrieval: Research Methods in Controlling, Sharing, and Editing of Web Video

Content-based video retrieval has been a very efficient technique with new video content, but it has not regarded the increasingly dynamic interactions between users and content. We present a comprehensive survey on user-based techniques and instrumentation for social video retrieval researchers. Community-based approaches suggest there is much to learn about an unstructured video just by analy...

A Comprehensive Review of Image Retrieval Based On Example Video Clip

In the recent years, with the usage of internet, there has been large amount of data resides on the web. Everyone is interested for accurate and fast retrieval search engines that retrieve images. This paper tries to present a comprehensive review and differentiate the various problems of image retrial techniques. This paper presents a survey of the most popular image retrieval techniques with ...

A fuzzy video content representation for video summarization and content-based retrieval

In this paper, a fuzzy representation of visual content is proposed, which is useful for the new emerging multimedia applications, such as content-based image indexing and retrieval, video browsing and summarization. In particular, a multidimensional fuzzy histogram is constructed for each video frame based on a collection of appropriate features, extracted using video sequence analysis techniq...

Cobra: A Content-Based Video Retrieval System

An increasing number of large publicly available video libraries results in a demand for techniques that can manipulate the video data based on content. In this paper, we present a content-based video retrieval system called Cobra. The system supports automatic extraction and retrieval of high-level concepts (such as video objects and events) from raw video data. It benefits from using domain k...

Fast Video Shot Retrieval with Sequence Trace in the Principal Component Space

Content-based video retrieval technology holds the key to the efficient management and sharing of video content from different sources, across different platforms and over different communication channels. In this work we present a fast retrieval algorithm based on matched filtering of the video sequence trace characteristics in the principal component space. Techniques to combat scale variance...

Publication date: 2005